AITopics | British Columbia

How Neural Reward Models Learn Features for Policy Optimization: A Single-Index Analysis

Higuchi, Rei, Kawata, Ryotaro, Wachi, Akifumi, Takakura, Shokichi, Miyaguchi, Kohei, Suzuki, Taiji

arXiv.org Machine Learning, May-26-2026

Reward modeling is not only a prediction problem: in KL-regularized policy optimization, the learned reward is exponentiated to define the deployed policy, so downstream value depends on errors in reward-tilted regions. We study this feedback in a Gaussian single-index model with $r^*(x) = σ^*(\langle θ^*, x\rangle)$ and $x \sim N(0, I_d)$. We analyze a two-stage neural reward model that first learns the hidden direction $θ^*$ from reward-weighted samples and then fits the readout layer by weighted ridge regression. Exponential reward weighting changes the Hermite signal available to the first layer; for any feature-learning temperature $β_1$ above a dimension-free $O(1)$ threshold, a constant fraction of neurons recover the hidden direction, with weak-recovery complexity governed by the generative exponent. After feature recovery, we derive tilted-policy value-gap bounds for an idealized label-weighted fit with weights $e^{y/β_2}$ and a more practical surrogate-weighted fit with weights $e^{r_{a_0}(x)/β_2}$. Keeping the $β_2$-dependence explicit yields an admissible set of deployment temperatures, balancing the gain from lowering $β_2$ against the learning cost amplified by exponential weighting; in the surrogate-weighted case, proxy-dependent factors shrink this admissible set.
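The second-stage fit described in this abstract can be illustrated with a minimal sketch, under loud simplifications that are not taken from the paper: the hidden direction is assumed to be recovered exactly, the link function is a hypothetical `tanh`, and the readout is a single scalar coefficient. It fits a ridge regression with label-dependent weights proportional to $e^{y/β_2}$ and reports the effective sample size of the weights, which shrinks as $β_2$ is lowered.

```python
import math
import random

def tilt_weights(rewards, beta):
    """Normalized weights proportional to exp(r / beta), computed stably."""
    m = max(rewards)
    w = [math.exp((r - m) / beta) for r in rewards]
    total = sum(w)
    return [wi / total for wi in w]

def effective_sample_size(weights):
    """(sum w)^2 / sum w^2: equals n for uniform weights, 1 for a point mass."""
    return sum(weights) ** 2 / sum(wi * wi for wi in weights)

def weighted_ridge_1d(features, targets, weights, lam):
    """Closed-form argmin over scalar a of sum_i w_i (a*phi_i - y_i)^2 + lam*a^2."""
    num = sum(w * p * y for w, p, y in zip(weights, features, targets))
    den = sum(w * p * p for w, p in zip(weights, features)) + lam
    return num / den

random.seed(0)
n = 2000
# z stands in for the projection <theta*, x> with x ~ N(0, I_d).
z = [random.gauss(0.0, 1.0) for _ in range(n)]
# Hypothetical link: tanh is an assumption, not the paper's sigma*.
phi = [math.tanh(zi) for zi in z]
y = [p + 0.1 * random.gauss(0.0, 1.0) for p in phi]

results = {}
for beta2 in (4.0, 1.0, 0.25):
    w = tilt_weights(y, beta2)
    a = weighted_ridge_1d(phi, y, w, lam=1e-3)
    results[beta2] = (effective_sample_size(w), a)
    print(f"beta2={beta2}: ESS={results[beta2][0]:.1f}, readout a={a:.3f}")
```

In this toy setting, lowering `beta2` concentrates the weights on high-reward samples, so fewer samples effectively contribute to the fit. This is one informal way to read the trade-off the abstract describes between the gain from a lower $β_2$ and the learning cost amplified by exponential weighting.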

artificial intelligence, machine learning, natural language, (18 more...)

arXiv.org Machine Learning

2605.24749

Country:

North America > United States (1.00)
Asia (1.00)
Europe (0.67)
North America > Canada > British Columbia (0.28)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Natural Language (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.92)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)

I Went to See What's Happened to the Home of the TED Talk. It Was a Little Terrifying.

Slate, May-8-2026, 09:40:00 GMT

Meanwhile its Audacious Project--a funding initiative that gives mature nonprofits the opportunity to pitch "moonshot" plans to a coalition of philanthropists--has raised over $1 billion in each of the last two years, in an epic Robin Hood operation for a handful of large-scale projects on climate, health, education, and criminal justice: The Audacious recipients here this year are taking this brief break from their work preventing 16 million unsafe abortions, helping governments in 20 countries prevent lead poisoning, or intercepting 5 percent of the world's river-borne plastic before it reaches the ocean.

advertisement, artificial intelligence, social media, (12 more...)

Slate

Country:

North America > United States > California (0.14)
North America > Canada > British Columbia (0.14)

Genre: Financial News (0.34)

Industry: Health & Medicine > Therapeutic Area > Obstetrics/Gynecology (0.69)

Technology:

Information Technology > Communications > Social Media (0.48)
Information Technology > Artificial Intelligence > Robots (0.48)

Canadian officials claim OpenAI violated federal and provincial privacy laws

Engadget, May-6-2026, 21:24:34 GMT

Philippe Dufresne, the Privacy Commissioner of Canada, has found OpenAI was not compliant with Canadian federal and provincial privacy laws in the training of its AI models. Following an investigation, Dufresne and his counterparts in Alberta, Quebec and British Columbia say OpenAI's approach to things like data collection and consent stepped on multiple laws, including Canada's Personal Information Protection and Electronic Documents Act (PIPEDA), which governs how companies collect and use personal information during the normal course of business. The commissioners participating in the investigation identified multiple privacy issues with OpenAI's approach, including that the company gathered vast amounts of personal information without adequate safeguards to prevent use of that information to train its models, and that it failed to acquire consent to collect and use that personal information in the first place. Warnings in ChatGPT note that interactions with the AI could be used in training, but third-party data OpenAI has purchased or scraped also includes personal details people likely aren't even aware of. The fact that ChatGPT users have no way to access, correct or delete that data was another issue that the commissioners identified, according to a summary of the investigation's findings, along with OpenAI's lackluster attempts to acknowledge the inaccuracy of some of ChatGPT's responses.

large language model, machine learning, natural language, (13 more...)

Engadget

Country:

North America > Canada > Quebec (0.26)
North America > Canada > British Columbia (0.26)
North America > Canada > Alberta (0.26)

Industry:

Information Technology > Security & Privacy (1.00)
Leisure & Entertainment > Games > Computer Games (0.72)
Government > Regional Government > North America Government > Canada Government (0.56)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (1.00)

Recovery Analysis for Plug-and-Play Priors using the Restricted Eigenvalue Condition

Neural Information Processing Systems, May-1-2026, 02:04:25 GMT

The plug-and-play priors (PnP) and regularization by denoising (RED) methods have become widely used for solving inverse problems by leveraging pre-trained deep denoisers as image priors. While the empirical imaging performance and the theoretical convergence properties of these algorithms have been widely investigated, their recovery properties have not previously been theoretically analyzed. We address this gap by showing how to establish theoretical recovery guarantees for PnP/RED by assuming that the solution of these methods lies near the fixed points of a deep neural network. We also present numerical results comparing the recovery performance of PnP/RED in compressive sensing against that of recent compressive sensing algorithms based on generative models. Our numerical results suggest that PnP with a pre-trained artifact removal network provides significantly better results compared to the existing state-of-the-art methods.
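For readers unfamiliar with the PnP idea, the following is a minimal sketch of a plug-and-play proximal-gradient iteration on a toy compressive sensing problem. It is not the paper's method or experimental setup: the pre-trained deep denoiser is swapped for a simple soft-thresholding function (which reduces the loop to ordinary ISTA), and the problem sizes, step size, and threshold are all illustrative assumptions.

```python
import math
import random

def matvec(A, x):
    """Compute A x for a matrix stored as a list of rows."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def rmatvec(A, r):
    """Compute A^T r."""
    return [sum(A[i][j] * r[i] for i in range(len(A))) for j in range(len(A[0]))]

def soft_threshold_denoiser(v, tau):
    """Stand-in denoiser (assumption): shrinks each entry toward zero by tau."""
    return [math.copysign(max(abs(vi) - tau, 0.0), vi) for vi in v]

def pnp_proximal_gradient(A, y, denoise, step, iters):
    """PnP iteration: gradient step on 0.5*||Ax - y||^2, then apply the denoiser."""
    x = [0.0] * len(A[0])
    for _ in range(iters):
        residual = [p - yi for p, yi in zip(matvec(A, x), y)]
        grad = rmatvec(A, residual)
        x = denoise([xi - step * g for xi, g in zip(x, grad)])
    return x

def norm(v):
    return math.sqrt(sum(vi * vi for vi in v))

random.seed(1)
m, n = 20, 40  # fewer measurements than unknowns
A = [[random.gauss(0.0, 1.0) / math.sqrt(m) for _ in range(n)] for _ in range(m)]
x_true = [0.0] * n
for idx in (3, 17, 31):  # hypothetical sparse signal
    x_true[idx] = 1.0
y = matvec(A, x_true)

# Step 1/||A||_F^2 is conservative: the Frobenius norm bounds the spectral norm.
step = 1.0 / sum(a * a for row in A for a in row)
# Threshold chosen relative to the data so the first iterate is nonzero.
tau = 0.1 * step * max(abs(c) for c in rmatvec(A, y))

x_hat = pnp_proximal_gradient(
    A, y, lambda v: soft_threshold_denoiser(v, tau), step, iters=200
)
res_init = norm(y)  # residual at the zero starting point
res_final = norm([p - yi for p, yi in zip(matvec(A, x_hat), y)])
print(f"residual: {res_init:.4f} -> {res_final:.4f}")
```

The `denoise` argument is the plug-in point: in PnP, a learned denoiser or artifact-removal network would be passed there in place of `soft_threshold_denoiser`, while the data-fidelity gradient step stays the same.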

algorithm, artificial intelligence, machine learning, (15 more...)

Neural Information Processing Systems

Country:

Europe (1.00)
North America > United States > California (0.28)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.14)

Genre: Research Report > New Finding (0.48)

Industry: Health & Medicine > Diagnostic Medicine > Imaging (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

03df5246cc78af497940338dd3eacbaa-Paper-Conference.pdf

Neural Information Processing Systems, May-1-2026, 01:32:05 GMT

artificial intelligence, machine learning, perturbation, (18 more...)

Neural Information Processing Systems

Country:

Europe (0.92)
Asia (0.68)
North America > Canada > British Columbia (0.28)

Genre: Research Report > New Finding (1.00)

Industry: Information Technology > Security & Privacy (0.95)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Security & Privacy (0.95)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)

Families sue OpenAI, alleging chatbot aided in Canadian school shooting

Al Jazeera, Apr-29-2026, 17:33:51 GMT

The families of victims of a school shooting in a remote Canadian Rockies town are suing artificial intelligence company OpenAI in a United States federal court, alleging that the ChatGPT maker failed to alert police to the shooter's alarming interactions with the chatbot. A lawsuit filed on Wednesday on behalf of 12-year-old Maya Gebala, who was critically injured in the February shooting, is among the first of more than two dozen cases from families in Tumbler Ridge, British Columbia, in what their lawyers say represents "an entire community stepping forward to hold OpenAI accountable". The cases represent the families of the five slain children targeted in the school shooting. Those include Zoey Benoit, Abel Mwansa Jr, Ticaria "Tiki" Lampert, Kylie Smith, all 12, and Ezekiel Schofield, 13, as well as education assistant Shannda Aviugana-Durand. Jesse Van Rootselaar, whose interactions with ChatGPT are at the centre of the lawsuits, shot her mother and stepbrother at home before killing an educational assistant and five students aged 12 to 13 at her former school on February 10, according to police.

large language model, lawsuit, machine learning, (12 more...)

Al Jazeera

Country:

North America > Canada > British Columbia (0.25)
North America > United States > California (0.15)

Industry:

Law (1.00)
Education > Health & Safety > School Safety & Security > School Violence (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.89)

Victims Allege OpenAI Is Responsible for Mass Shooting

Mother Jones, Apr-29-2026, 12:15:00 GMT

A new lawsuit underscores key questions about the Tumbler Ridge killer's use of ChatGPT. Victims of the Tumbler Ridge mass shooting and their families sued OpenAI and its CEO, Sam Altman, in US district court in San Francisco on Wednesday, claiming various negligence, product liability, and other violations. The civil complaints are the latest in a wave of litigation against OpenAI alleging that its globally popular chatbot, ChatGPT, helped people commit lethal violence. The complaints were filed by families of multiple victims wounded and killed at Tumbler Ridge Secondary School in British Columbia, Canada, where a suicidal 18-year-old opened fire on February 10.

large language model, machine learning, natural language, (18 more...)

Mother Jones

Country:

North America > United States > California > San Francisco County > San Francisco (0.24)
North America > Canada > British Columbia (0.24)

Industry:

Law > Litigation (1.00)
Law > Criminal Law (1.00)
Law Enforcement & Public Safety > Crime Prevention & Enforcement (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.90)

Generating multivariate time series with COmmon Source CoordInated GAN (COSCI-GAN)

Neural Information Processing Systems, Apr-27-2026, 22:51:12 GMT

Generating multivariate time series is a promising approach for sharing sensitive data in many medical, financial, and IoT applications. A common type of multivariate time series originates from a single source such as the biometric measurements from a medical patient. This leads to complex dynamical patterns between individual time series that are hard to learn by typical generation models such as GANs. There is valuable information in those patterns that machine learning models can use to better classify, predict or perform other downstream tasks. We propose a novel framework that takes time series' common origin into account and favors channel/feature relationships preservation. The two key points of our method are: 1) the individual time series are generated from a common point in latent space and 2) a central discriminator favors the preservation of inter-channel/feature dynamics. We demonstrate empirically that our method helps preserve channel/feature correlations and that our synthetic data performs very well in downstream tasks with medical and financial data.

artificial intelligence, deep learning, machine learning, (16 more...)

Neural Information Processing Systems

Country: North America > Canada > British Columbia (0.14)

Genre: Research Report > Experimental Study (0.93)

Industry:

Information Technology > Security & Privacy (1.00)
Banking & Finance (0.88)
Health & Medicine > Government Relations & Public Policy (0.68)
Health & Medicine > Health Care Providers & Services > Reimbursement (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.95)

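Polynomially Over-Parameterized Convolutional Neural Networks Contain Structured Strong Winning Lottery Tickets

artificial intelligence, machine learning, pruning, (18 more...)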

Neural Information Processing Systems

Country:

North America > United States (1.00)
Europe (1.00)
North America > Canada > British Columbia (0.28)

Genre:

Contests & Prizes (0.50)
Research Report (0.46)

Industry: Leisure & Entertainment > Gambling (0.50)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.82)

Learning State Representations from Random Deep Action-conditional Predictions

Neural Information Processing Systems, Apr-27-2026, 00:28:22 GMT

Our main contribution in this work is an empirical finding that random General Value Functions (GVFs), i.e., deep action-conditional predictions--random both in what feature of observations they predict as well as in the sequence of actions the predictions are conditioned upon--form good auxiliary tasks for reinforcement learning (RL) problems. In particular, we show that random deep action-conditional predictions when used as auxiliary tasks yield state representations that produce control performance competitive with state-of-the-art hand-crafted auxiliary tasks like value prediction, pixel control, and CURL in both Atari and DeepMind Lab tasks. In another set of experiments we stop the gradients from the RL part of the network to the state representation learning part of the network and show, perhaps surprisingly, that the auxiliary tasks alone are sufficient to learn state representations good enough to outperform an end-to-end trained actor-critic baseline.

machine learning, natural language, reinforcement learning, (19 more...)

Neural Information Processing Systems

Country:

North America > United States (1.00)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.14)

Genre: Research Report > New Finding (0.68)

Industry: Leisure & Entertainment > Games (0.48)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.49)

Polynomially Over-Parameterized Convolutional Neural Networks Contain Structured Strong Winning Lottery Tickets